-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
POC - Changes to use persistent recursive watch for PRS enabled collections #58
Open
patsonluk
wants to merge
26
commits into
master
Choose a base branch
from
patsonluk/zk-persistent-watch
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
# Conflicts: # solrmonitor/solrmonitor.go
…ow. as zkwatcherman does not really use the returned EventQueue
…does not use recursive watch would see less changes
Added unit test case to verify # of fetches
patsonluk
changed the title
Changes to use persistent recursive watch for PRS enabled collections
POC - Changes to use persistent recursive watch for PRS enabled collections
Jun 20, 2023
go: downloading github.com/patsonluk/zk v1.0.3-0.20230609180938-ecd7812130b6 smutil/util.go:23:2: //go:build comment without // +build comment
…e not fetching, otherwise later notification might be handled (callback) before the previous notification is handled
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Description
POC to assess scope of change/challenges to implement solrmonitor/solrman to make use of the recursive persistent watch.
Since Apache zk v3.6. A new class of watch is introduced which can be persistent and/or recursive.
Before such class of watch, if we want to keep watching a particular data node (as implemented in zkwatcherman for example), we would:
For watching children of a particular node (for example state.json for PRS), it's very similar:
EventNodeChildrenChanged
event, then we call childrenW() again to get the list of children paths and re-install the watchWith permanent watch, we will still NEED to fetch the data again, just that we can call
get()
instead ofgetW()
Now recursive watch HAS to be permanent as well. It can become a replacement of data + children watch, more importantly, it could reduce the number of fetch and amount of data tranferred. For example, if we want to watch the child change of
state.json
. And assume there's a new child entry added, and then a old child entry removed:state.json
, on first child entry addition, we will need to callchildrenW()
to get the full list of child paths, and then again on child deletion.Solution
For the zk goclient, there's an open PR (lacking reviews unfortunately) that implements it. Therefore we have cherry picked those changes into https://github.com/patsonluk/zk/commits/cherry-pick-persistent-watch
Now there are 2 major consumers of zk goclient:
This POC in particular focuses on replacing PRS monitoring in solrmonitor/zkwatcherman with permanent recursive watch, which has the best potential for performance improvement. (reduce fetches)
A new branch of gosolr is created https://github.com/fullstorydev/gosolr/tree/patsonluk/zk-persistent-watch that modifies the collections state/PRS monitoring. For each collection, instead of adding a data watch + children watch on state.json. it will only install a single permanent recursive watch. And instead of fetching full child path and updating the collections state with all the entries, it will use the path provided in the zkEvent (so no extra fetching) and update the collection state with it.
solrmonitor should no longer rely on
ChildrenChanged
(triggered byzk.EventNodeChildrenChanged
) for PRS updates, instead it will rely onDataChanged
(triggered byzk.EventNodeCreated
andzk.EventNodeDeleted
) to be notified of PRS entry changes. There's also a newShouldFetchData
function, which would return false for PRS entry, this signals zkwatcherman to NOT fetch the data on zk.EventNodeCreated and zk.EventNodeDeleted as we only care about the path for PRS.Also zkwatcherman needs special handling for recursive monitoring. For non-recursive monitoring, zkwatcherman always fetches the path (whether
get
andchildren
) and knows the exact path to fetch. However, for recursive monitoring, it will need to recursively fetch all the children. This however, can be costly (n extra children fetch, which n is # of PRS entries), in the new functionMonitorDataRecursive
, a parameterinitFetchDepth
is added to indicate how deep should the children be fetched for init.To avoid changes to the higher level
SolrEventListener
, we will still maintain the existingSolrCollectionReplicaStatesChanged(name string, replicaStates map[string]*PerReplicaState)
function even for single PRS entry changes (not any worse than before, as n child changes should result as n childrenWatch events before)Tests
Added extra test into the existing
TestPRSProtocol
:Added a new case to ensure fetched collection/PRS state is correct on existing collection (before solrmonitor starts monitoring it). This is critical as the initial fetch logic on persistent watch is tricky (need to recursively fetch path)unnecessary, found that another testTestCollectionChanges
is already checking thatTested locally with both solrman and solrmonitor referencing the new gosolr hash on https://github.com/fullstorydev/gosolr/commits/patsonluk/zk-persistent-watch . Test operations such as split to ensure it works.
Also wanted to test on solrmonitor (the gosolr) one with
--flaky
flag, however, it turns out that such flag is only used post init:gosolr/solrmonitor/main/solrmonitor/solrmonitor.go
Line 90 in 25ddda1
TODO
This POC focuses only on replacing PRS related zk ops. Therefore only
zkwatcherman.MonitorData
has been modified (to support recursive watching). Other operations such as cluster props or children watch on collections/live nodes remain the same. Take note thatzkwatcherman.MonitorChildren
is also unchanged - it's still a one-time watch.This assumes ALL watches on
zkwatcherman.MonitorData
are persistent, it is tricky to allow a switch as based on whether the watch is persistent, thezkwatcherman.Callbacks
provided by the caller (in this case by solrmonitor) might need to be changed (for example ShouldWatchChildren and ShouldWatchData) as well. Keeping it simple for POC. Update see #59 for extra changes required to support such switch